Text categorization using lexical chains
Abstract
In this report I present a prototype system for research on dynamic text categorization. The system implements lexical chaining as described in recent literature, with a simple extension built on top that automatically identifies one or more categories for a given text. The initial tests presented in this report do not give any useful results; however, they give rise to new questions and possible directions for future research on lexical chaining and its uses in text categorization. Along with the implementation, I discuss previous research and the lexicographic database WordNet.
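The abstract does not spell out the chaining procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: a hand-built table `TOY_SENSES` stands in for WordNet, chains are built greedily, and the category of a text is taken from its longest chains. The names `TOY_SENSES`, `build_chains`, and `categorize` are hypothetical and not taken from the report.

```python
# Toy stand-in for WordNet (assumption): each word maps to a set of
# hypothetical concept labels instead of real synsets.
TOY_SENSES = {
    "car": {"vehicle"},
    "truck": {"vehicle"},
    "engine": {"vehicle", "machine"},
    "bank": {"finance", "river"},
    "loan": {"finance"},
    "interest": {"finance"},
}


def build_chains(words, senses=TOY_SENSES):
    """Greedy chaining: attach each word to the first chain that shares
    a concept with it, otherwise start a new chain."""
    chains = []
    for word in words:
        concepts = senses.get(word)
        if not concepts:
            continue  # words unknown to the toy lexicon are skipped
        for chain in chains:
            shared = chain["concepts"] & concepts
            if shared:
                # Narrow the chain to the concepts all members agree on,
                # which also disambiguates words such as "bank".
                chain["concepts"] = shared
                chain["words"].append(word)
                break
        else:
            chains.append({"concepts": set(concepts), "words": [word]})
    return chains


def categorize(words, top_n=1):
    """Return one concept label for each of the top_n longest chains."""
    ranked = sorted(build_chains(words), key=lambda c: len(c["words"]), reverse=True)
    return [sorted(chain["concepts"])[0] for chain in ranked[:top_n]]


if __name__ == "__main__":
    text = ["bank", "loan", "interest", "car", "truck"]
    print(categorize(text, top_n=2))
```

A real system would replace the toy table with WordNet relations (synonymy, hypernymy, and so on) and would likely score chains by more than length; the sketch only shows how chains can double as category evidence.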
Similar resources
Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus
Wednesday, June 15th 8:00 Conference Registration (Registration desk) 8:45 Session 1: Large-Scale Online Linguistic Resources (I) Chair: "Text Categorization Based on Subtopic Clusters" Francis Chik, Robert Luk, Korris Chung "Automatic Filtering of Bilingual Corpora for Statistical Machine Translation" Shahram Khadivi, Hermann Ney "The Role of Word Sense Disambiguation in Automated Text Categor...
Interaction Chain Patterns of Online Text Construction with Lexical Cohesion
This study aims at arousing college students’ metacognition in detecting lexical cohesion during online text construction as WordNet served as a lexical resource. A total of 83 students were requested to construct texts through sequences of actions identified as interaction chains in this study. Interaction chains are grouped and categorized as a meaningful entity in order to investigate the st...
A Supervised Approach to Keyword Extraction from Persian Documents Using Lexical Chains
Keywords are the main focal points of interest within a text, which intends to represent the principal concepts outlined in the document. Determining the keywords using traditional methods is a time consuming process and requires specialized knowledge of the subject. For the purposes of indexing the vast expanse of electronic documents, it is important to automate the keyword extraction task. S...
Using Genetic Algorithms with Lexical Chains for Automatic Text Summarization
Automatic text summarization takes an input text and extracts the most important content in the text. Determining the importance of information depends on several factors. In this paper, we combine two different approaches that have been used in the text summarization domain. The first one is using genetic algorithms to learn the patterns in the documents that lead to the summaries. The other o...
Minimal training based semantic categorization in a voice activated question answering (VAQA) system
In this paper, we develop a knowledge based methodology that maps Automatic Speech Recognizer (ASR) transcriptions to predefined semantic categories in a Voice Activated Question Answering (VAQA) system. The proposed semantic categorization methodology, SemCat, uses a novel lexical chains/ontology based algorithm and relies heavily on customized but domain independent Natural Language Processin...